Method and apparatus for automatically processing a user's communication

ABSTRACT

The invention concerns a method and apparatus for processing a user's communication. The invention may include receiving a list of recognized symbol strings of one or more recognized entries. The list of recognized symbol strings may include a first similarity score associated with each recognized entry. From each recognized symbol string, one or more contiguous sequences of N-symbols may be extracted. One of the extracted contiguous sequences of N-symbols may be matched with at least one stored contiguous sequence of N-symbols from a first database. A preliminary set of symbol strings and associated second similarity scores may be generated. The preliminary set of symbol strings may include one or more stored symbol strings from a second database that correspond to the at least one matched contiguous sequence of N-symbols. A third similarity score associated with the one or more stored symbol strings included in the preliminary set of symbol strings may be computed. Based on the computed third similarity score, a refined set of symbol strings, selected from the preliminary set of symbol strings, may be output.

FIELD OF THE INVENTION

[0001] The present invention relates to automatically processing a user's communication. In particular, the present invention relates to a method and apparatus for automatically recognizing and/or processing a possibly erroneous or incomplete user's communication.

BACKGROUND OF THE INVENTION

[0002] In recent years, automated attendants have become very popular. Many individuals and organizations use automated attendants to automatically provide information to callers or to route incoming calls. Typically, a user places a call and reaches an automated attendant (e.g., an Interactive Voice Response (IVR) system) that prompts the user for the desired information and searches an informational database for the requested information. The user enters the request, for example, the name of a business or individual, via a keyboard, a keypad, or spoken input. The automated attendant searches for a match based on the user's input and outputs a result if a match is found.

[0003] However, in these conventional systems, if the automated attendant is unable to find a suitable match in its database, or if the user is unsatisfied with the results, the user is connected to an operator for further assistance. This process can be time consuming, and a user may become frustrated if he or she does not have the exact name of the business for which additional information, such as a telephone number, is desired. In other words, if the user has a partial or erroneous name of the business, the user may not be able to find the desired information quickly, or at all. This scenario results in either wasted time or a lost business opportunity for both the user and the intended business.

SUMMARY OF THE INVENTION

[0004] The invention concerns a method and apparatus for processing a user's communication. The invention may include receiving a list of recognized symbol strings of one or more recognized entries. The list of recognized symbol strings may include a first similarity score associated with each recognized entry. From each recognized symbol string, one or more contiguous sequences of N-symbols may be extracted. At least one extracted contiguous sequence of N-symbols may be matched with at least one stored contiguous sequence of N-symbols from a first database. Based on the matched sequences of N-symbols and the first similarity scores, a preliminary set of symbol strings and associated second similarity scores may be generated. The preliminary set of symbol strings may include one or more stored symbol strings from a second database that contain one or more matched contiguous sequences of N-symbols. A third similarity score associated with each of the one or more stored symbol strings included in the preliminary set of symbol strings may be computed. Based on the computed third similarity scores, a refined set of symbol strings, selected from the preliminary set of symbol strings, may be output.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] The invention is described in detail with reference to the following drawings, wherein like numerals reference like elements, and wherein:

[0006] FIG. 1 is an exemplary block diagram of a communication processing system in accordance with an embodiment of the present invention.

[0007] FIG. 2 is a detailed block diagram of a database entry matcher, shown in FIG. 1, in accordance with an exemplary embodiment of the present invention.

[0008] FIG. 3 is a flowchart illustrating an automatic input recognition process in accordance with an exemplary embodiment of the present invention.

[0009] FIG. 4 illustrates the application of the automatic input recognition process to an exemplary input in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0010] Embodiments of the present invention relate to a method and apparatus for automatically recognizing and/or processing a user's communication. In some cases the user's communication may be erroneous or incomplete; in other cases it may be correct and complete. In an exemplary embodiment, the invention may use whole words, parts of words, or one or more contiguous sequences of symbols or characters to search one or more databases for the entries that best match the user's input communication. Based on the user's communication, embodiments of the invention may reduce the volume of computation required when processing, for example, a user's request for information. Accordingly, embodiments of the present invention may provide a more efficient and effective system for automatically processing the user's request with minimal external intervention.

[0011] In an exemplary embodiment of the present invention, one or more contiguous sequences of listings N-symbols (listings N-grams) may be extracted from each entry in a listings database. The entries of the listings database may be stored symbol string entries corresponding to information that may be desired by users.

[0012] In exemplary embodiments of the present invention, a user's communication is received and a ranked list of one or more recognized symbol string entries may be generated. Each recognized symbol string entry in the list may be ranked according to a corresponding first similarity score. One or more recognized contiguous sequences of N-symbols (recognized N-grams) may be extracted from each recognized symbol string entry in the list. The recognized N-grams extracted for each entry may be matched with identical listings N-grams. In an embodiment of the present invention, a second similarity score may be calculated for one or more database entries mapped to the matched listings N-grams. In one embodiment, the second similarity score may be used to generate a preliminary set of symbol strings that may include one or more database entries mapped to the matched listings N-grams. In embodiments of the present invention, the second similarity score can be used to further narrow the number of listings that need to be searched to find one or more entries that may be similar or equivalent to the user's communication.

[0013] In an exemplary embodiment of the present invention, a refined set of symbol strings may be generated by evaluating the ranked list of recognized symbol strings with respect to the preliminary set of symbol strings. The refined set of symbol strings may be generated based on associated third similarity scores. In one embodiment of the invention, entries of the refined set of symbol strings may include one or more database entries that have a favorable third similarity score.

[0014] In exemplary embodiments of the present invention, the refined set of symbol strings may be output to the user for selection. If a selection is made, corresponding information may be retrieved and presented to the user. In further embodiments of the invention, further processing may occur, by evaluating the refined set of symbol strings with respect to the recognized symbol strings, before an output is presented to the user.

[0015] FIG. 1 is an exemplary block diagram of a communication processing system 100 for processing a user's communication in accordance with an embodiment of the present invention. A recognizer 120 is coupled to a database 110, an output manager 130, and a database entry matcher 140.

[0016] While the examples discussed in the embodiments of the patent concern recognition of speech, the recognizer 120 may also receive a user's communication or inputs in the form of speech, text, digital signals, analog signals, and/or any other form of communication or communication signal. As used herein, a user's communication can be a user's input in any form that represents, for example, a single word, multiple words, a single syllable, multiple syllables, a single phoneme, and/or multiple phonemes. The user's communication may include a request for information, products, services, and/or any other suitable request.

[0017] A user's communication may be input via a communication device such as a wired or wireless phone, a pager, a personal digital assistant, a personal computer, and/or any other device capable of sending and/or receiving communications. In embodiments of the present invention, the user's communication could be a search request to search the World Wide Web (WWW), a Local Area Network (LAN), and/or any other private or public network for the desired information.

[0018] In embodiments of the present invention, the recognizer 120 may be any type of recognizer known to those skilled in the art. In one embodiment, the recognizer may be an automated speech recognizer (ASR) such as the type developed by Nuance Communications. Where the recognizer 120 is an ASR, the communication processing system 100 may operate similarly to an IVR but with the advantages of a database entry matcher 140 in accordance with embodiments of the present invention. In alternative embodiments of the present invention, the recognizer 120 can be a text recognizer, an optical character recognizer, and/or another type of recognizer or device that recognizes and/or processes a user's inputs, and/or a device that receives a user's input, for example, a keyboard or a keypad. In embodiments of the present invention, the recognizer 120 may be incorporated within a personal computer, a telephone switch or telephone interface, and/or an Internet, intranet, and/or other type of server. In an alternative embodiment of the present invention, the recognizer 120 may include and/or operate in conjunction with, for example, an Internet search engine that receives text, speech, etc. from an Internet user. In this case, the recognizer 120 may receive the user's communication via an Internet connection and operate in accordance with embodiments of the invention as described herein.

[0019] In one embodiment of the present invention, the recognizer 120 receives the user's communication and generates a ranked list of recognized symbol strings using known methods. The symbol strings may be text or character strings that represent individual or business listings and/or other information for which the user desires additional information. In one example, the recognized symbol string may be the name of a business for which the user desires a telephone number. In an exemplary embodiment of the present invention, if a desirable match is not found in the listings database 110, the recognizer 120 may generate a ranked list of symbol strings that are similar to the user's input. Each symbol string generated by the recognizer may be a hypothesis of what was originally input by the user, and each symbol string may be ranked by an associated first similarity or probability score.

[0020] The database 110 may include a listings database that has stored symbol strings or information entries, $L(\mathrm{listings}_{\mathrm{all}})$, that represent information relating to a particular subject matter. For example, the listings database may include residential, governmental, and/or business listings for a particular town, city, state, and/or country. The stored symbol strings $L(\mathrm{listings}_{\mathrm{all}})$ could also represent or include a myriad of other types of information, such as individual directory information, specific business or vendor information, postal addresses, e-mail addresses, etc. In embodiments of the present invention, the database 110 can be part of a larger database of listings information, such as a database or other information resource that may be searched by, for example, any Internet search engine when performing a user's search request.

[0021] In embodiments of the present invention, the database 110 may also include a recognizer grammar list generated from the symbol strings $L(\mathrm{listings}_{\mathrm{all}})$ stored in the listings database. The recognizer grammar list may include, for example, a plurality of different ways in which users may refer to the symbol strings stored in the listings database 110. The recognizer 120 may generate the list of recognized symbol strings and associated first similarity scores based on the recognizer grammar list stored in the database 110.

[0022] In an embodiment of the present invention, the database entry matcher 140 receives the stored symbol strings $L(\mathrm{listings}_{\mathrm{all}})$ and may extract one or more contiguous sequences of N-symbols (listings N-grams) from each symbol string entry stored in the database 110 (i.e., each entry of the stored symbol strings $L(\mathrm{listings}_{\mathrm{all}})$). The database entry matcher 140 also maps each listings N-gram to the symbol string from which it was extracted. In one embodiment, for one or more extracted listings N-grams, a list of the listings symbol strings containing a particular N-gram is stored, together with elementary second similarity scores. This list may be full, that is, it may include all listings symbol strings from the database containing a particular N-gram. Alternatively, the list may be short, that is, it may include only some of the listings symbol strings from the database containing that particular N-gram. The elementary second similarity scores for a particular N-gram may differ among the listings symbol strings containing that N-gram, or they may be the same for all listings symbol strings containing that N-gram. The use of full lists may significantly reduce the volume of computation required to process a user's communication and present a more desirable search result. The use of short lists may reduce the volume of computation further. In alternative embodiments of the present invention, all of the listings N-grams may be mapped to, and stored with, the corresponding stored symbol string entries in a master list.

[0023] In an exemplary embodiment of the present invention, the database entry matcher 140 extracts one or more recognized contiguous sequences of N-symbols (recognized N-grams) from each recognized symbol string entry in the list of recognized symbol strings generated by the recognizer 120. The database entry matcher 140 may search all of the listings N-grams to find an identical match for each of the recognized N-grams.

[0024] Based on the matched listings N-grams, the database entry matcher 140 may generate a preliminary set of symbol strings based on associated second similarity scores. Entries of the preliminary set of symbol strings include the stored symbol strings mapped to the matched N-grams.

[0025] The database entry matcher 140 may compute a third similarity score for each entry in the preliminary set of symbol strings. Based on the third similarity scores, the database entry matcher 140 may generate a refined set of symbol strings. The refined set of symbol strings typically includes the best or closest match for the user's communication originally received by the recognizer 120.

[0026] The database entry matcher 140 outputs the refined set of symbol strings to the output manager 130 for processing. The output manager 130 may forward the refined set of symbol strings to the user for selection. Based on the user's selection, the output manager 130 may route a call for the user, retrieve and present additional information to the user, present another prompt to the user, terminate the call if the desired results have been achieved, or perform other steps to output a desired result for the user.

[0027] The database entry matcher 140 will now be described in more detail with reference to the exemplary block diagram shown in FIG. 2. The database entry matcher 140 may include, for example, an N-gram map generator 210, an N-gram database 240, a rough and fast matcher 220, and a refined matcher 230. The N-gram map generator 210 is coupled to the database 110, the N-gram database 240, and the rough and fast matcher 220. The rough and fast matcher 220 is coupled to the recognizer 120 and the refined matcher 230. The refined matcher 230 is also coupled to the recognizer 120 and to the output manager 130. The term “coupled” as used herein implies direct or indirect coupling. The rough and fast matcher 220 is further coupled to, for example, the N-gram database 240 and the output manager 130.

[0028] In embodiments of the present invention, the N-gram map generator 210 may extract one or more listings N-grams from each of the listings in the database 110. As indicated above, for one or more extracted listings N-grams, a list (for example, a full list and/or a short list) of the listings symbol strings containing a particular N-gram may be stored, along with corresponding elementary second similarity scores, for example, in the N-gram database 240. Additionally or optionally, this mapping of N-grams to, for example, full lists or short lists may be stored in the database 110. The rough and fast matcher 220 may receive a list of recognized symbol strings with associated first similarity scores. The rough and fast matcher 220 may extract one or more recognized N-grams from each recognized symbol string and may match at least one of the recognized N-grams with at least one of the listings N-grams. The rough and fast matcher 220 may further generate a preliminary set of symbol strings and associated second similarity scores. The preliminary set of symbol strings may include one or more listings from the database 110 that are mapped to the matched N-grams. The refined matcher 230 may receive the preliminary set of symbol strings and may compute a third similarity score for each of these listings from the database 110. The refined matcher 230 may output a refined set of symbol strings from the preliminary set of symbol strings based on the computed third similarity scores.

[0029] Referring to exemplary FIG. 2, the function of each component of the database entry matcher 140 will now be described in detail. During a pre-processing stage, according to an exemplary embodiment of the present invention, the N-gram map generator 210 receives the plurality of symbol strings $L(\mathrm{listings}_{\mathrm{all}})$ stored in the database 110. The N-gram map generator 210 may extract one or more listings N-grams from each symbol string entry stored in the database 110. In other words, from each stored symbol string entry the N-gram map generator 210 extracts contiguous sequences of N symbols or N characters, where N can be, for example, 1, 2, 3, 4, 5, or more symbols or characters in length. A symbol, as used herein, can be any character, sign, mark, figure, or other representation in, for example, the English language or any other language.

[0030] For example, a symbol string entry such as “Park Flowers” would yield the following N-grams $g_k^w$: “(par”, “park”, “ark_”, “rk_f”, “k_fl”, “_flo”, “flow”, “lowe”, “ower”, “wers”, and “ers)”. In this example, N equals 4 symbols, “(” denotes the start symbol, “_” denotes a space between words, and “)” denotes the end symbol. In other words, an imaginary window that is N (e.g., 4) symbols long is placed over the symbol string entry and moved along it one symbol at a time. The symbol “(” in “(par” may indicate that this is the first N-gram of the corresponding symbol string. The symbol “_” may indicate that the symbol string contains a space, which implies that the symbol string consists of more than one word. The symbol “)” in “ers)” may indicate that this is the last N-gram of the corresponding symbol string. Any symbol or character may be chosen to designate the beginning, end, space, or other characteristic of the symbol strings.
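The sliding-window extraction just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `extract_ngrams`, the default N = 4, and the lower-casing of the entry are choices made for this sketch; the marker symbols follow the “Park Flowers” example.

```python
def extract_ngrams(entry: str, n: int = 4, start: str = "(",
                   end: str = ")", space: str = "_") -> list[str]:
    """Return the N-grams g^1, g^2, ... of `entry`, in window order.

    Hypothetical helper: lower-cases the entry (an assumption), adds the
    start and end markers, and replaces spaces with the word-boundary symbol.
    """
    marked = start + entry.lower().replace(" ", space) + end
    # A marked string no longer than the window yields a single N-gram.
    if len(marked) <= n:
        return [marked]
    # Slide an n-symbol window over the string, one symbol at a time.
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


grams = extract_ngrams("Park Flowers")
print(grams)
# ['(par', 'park', 'ark_', 'rk_f', 'k_fl', '_flo', 'flow', 'lowe', 'ower', 'wers', 'ers)']
```

The position of each N-gram in the returned list, counted from 1, plays the role of the superscript w in $g_k^w$.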

[0031] Additionally, in the above example, $k$ designates a particular symbol string entry and $w$ designates the N-gram number for that symbol string entry. Thus, in this case, the N-grams could be designated as $g_k^1, g_k^2, \ldots, g_k^w, \ldots$, where $k = 1$ may designate the symbol string entry “Park Flowers” and $w = 1$ may designate the first N-gram, “(par”, of the corresponding listing.

[0032] Each listings N-gram may be stored in the N-gram database 240 and may be mapped to a list of the listings symbol strings stored in the database 110 that contain this N-gram, together with corresponding elementary second similarity scores. This list may be full, that is, it may include all listings symbol strings from the database containing a particular N-gram. Alternatively, it may be short, that is, it may include only some of the listings symbol strings from the database containing that particular N-gram. The elementary second similarity scores for a particular N-gram may differ among the listings symbol strings containing that N-gram, or they may be the same for all listings symbol strings containing that N-gram. In one embodiment of the present invention, the N-gram map generator 210 may create one or more lists, e.g., $\mathrm{List}(g) = \{\mathrm{listing}_1, \mathrm{listing}_2, \ldots, \mathrm{listing}_k\}$, from the stored symbol strings $L(\mathrm{listings}_{\mathrm{all}})$. Each list List(g) may include all entries stored in the database 110 that contain a particular listings N-gram g, or only some of those entries; List(g) may thus be a short list and/or a full list. For example, for the N-gram g = “k_fl”, the list List(“k_fl”) = {“park flowers”, “greystone park florist”, “normandy park florist”, “shearman park florist”}. In another example, for the listings N-gram g = “(par”, the short list List(“(par”) = {“paratransit services of nj”, “parent training”, “parfums jean jacques broussard”, “park flowers”, “parker family health clinic”, “parsons cappiello and nardelli”, “part makers inc”}.
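A full list List(g) is, in effect, an inverted index from N-grams to listings. The sketch below builds one with a dictionary. The helper names are hypothetical; it follows the marker conventions of the “Park Flowers” example in paragraph [0030], lower-cases entries (an assumption), and omits the elementary second similarity scores, which would be stored alongside each entry.

```python
from collections import defaultdict


def extract_ngrams(entry: str, n: int = 4) -> list[str]:
    # "(" marks the start, ")" the end, "_" a space between words.
    marked = "(" + entry.lower().replace(" ", "_") + ")"
    if len(marked) <= n:
        return [marked]
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


def build_full_lists(listings: list[str], n: int = 4) -> dict[str, list[str]]:
    """Map every listings N-gram g to List(g), the listings that contain g."""
    lists = defaultdict(list)
    for listing in listings:
        # dict.fromkeys drops repeated N-grams while keeping their order,
        # so a listing is entered at most once in each List(g).
        for g in dict.fromkeys(extract_ngrams(listing, n)):
            lists[g].append(listing)
    return dict(lists)


listings = ["park flowers", "greystone park florist",
            "normandy park florist", "shearman park florist",
            "parent training"]
full = build_full_lists(listings)
print(full["k_fl"])
# ['park flowers', 'greystone park florist', 'normandy park florist', 'shearman park florist']
```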

[0033] Short lists that are subsets of full lists may be created in many ways. In one embodiment of the present invention, the number of elements in short lists may be limited to a fixed number that may be the same for all N-grams, for example, 200. The creation of a short list for a particular N-gram g may be controlled by listings symbol string priorities. A listings symbol string priority may be defined as the ratio of the number of all N-grams that can be extracted from that listings symbol string to the number of short lists in which that listings symbol string has been included before the processing of the particular N-gram g. All the listings symbol strings containing the N-gram g may then be ordered according to their priorities, and the top predefined number of them, for example 200, may be included in the short list for this N-gram g. In embodiments of the present invention, the priorities of the included listings symbol strings may then be recomputed. In additional embodiments of the present invention, the priorities may be computed as some other function of the number of all N-grams that can be extracted from the listings symbol string and the number of short lists in which that listings symbol string has been included before the processing of a particular N-gram g.
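The priority-driven construction of short lists might look like the following sketch. Two details are assumptions not fixed by the text: the N-grams are processed in sorted order so that the result is reproducible, and the priority divides by (inclusions + 1), one instance of the “other function” the text allows, so that a listing not yet in any short list does not cause a division by zero. The names are hypothetical.

```python
def build_short_lists(full_lists: dict[str, list[str]],
                      ngram_counts: dict[str, int],
                      limit: int = 200) -> dict[str, list[str]]:
    """Select at most `limit` listings per N-gram, favoring under-covered ones.

    full_lists:   N-gram g -> full list List(g)
    ngram_counts: listing -> number of N-grams extractable from that listing
    """
    inclusions = {listing: 0 for listing in ngram_counts}
    short_lists: dict[str, list[str]] = {}
    for g in sorted(full_lists):  # assumed processing order
        # Priority = N-gram count / (short lists already joined + 1).
        # Python's sort is stable even with reverse=True, so ties keep
        # their order in List(g).
        ranked = sorted(full_lists[g],
                        key=lambda s: ngram_counts[s] / (inclusions[s] + 1),
                        reverse=True)
        chosen = ranked[:limit]
        short_lists[g] = chosen
        # Recompute priorities by recording the new inclusions.
        for s in chosen:
            inclusions[s] += 1
    return short_lists


full = {"a": ["x", "y", "z"], "b": ["x", "y"]}
counts = {"x": 10, "y": 6, "z": 1}
result = build_short_lists(full, counts, limit=1)
print(result)  # {'a': ['x'], 'b': ['y']}
```

In this toy run “x” wins the short list for “a”, and its priority then halves, so “y” takes the single slot for “b”. Spreading inclusions this way keeps every listing reachable through at least some short lists.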

[0034] In embodiments of the present invention, each listings N-gram may be mapped to its corresponding list List(g). Each list and the corresponding mapping may be stored in the N-gram database 240. Lists List(g), whether full or short, can significantly reduce the volume of computation required to process a user's communication and present a more desirable search result. In alternative embodiments of the present invention, computation can be reduced further by creating sub-lists of the listings database that contain information related to a particular subject matter. For example, restaurant listings may be placed into a separate sub-list that can include its own lists List(g) for the corresponding listings N-grams. Thus, if a user requests a restaurant listing, only the sub-list for restaurants and its corresponding lists need be evaluated in accordance with embodiments of the present invention.

[0035] It is recognized that the description of full lists, short lists, and/or sub-lists given above is given by way of example only, and that one of ordinary skill in the art can employ a myriad of other methods or techniques to process information or listings to achieve efficiencies in accordance with embodiments of the present invention.

[0036] In one embodiment of the present invention, the N-gram map generator 210 may be insensitive to case and/or punctuation, so that punctuation is removed and capital letters are changed to lower case letters when N-grams are extracted. In an alternative embodiment, the N-gram map generator 210 may be case and/or punctuation sensitive, so that punctuation and capital letters are retained when the N-grams are extracted.
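A case- and punctuation-insensitive normalization step could be sketched as below. The function name and flags are hypothetical, and Python's `string.punctuation` (ASCII punctuation only) is used as an assumed definition of “punctuation”. Because that set also contains “(”, “)” and “_”, normalization should run before the start, end, and space markers are added.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(entry: str, case_sensitive: bool = False,
              keep_punctuation: bool = False) -> str:
    """Prepare a symbol string for N-gram extraction."""
    if not keep_punctuation:
        entry = entry.translate(_STRIP_PUNCT)
    if not case_sensitive:
        entry = entry.lower()
    return entry


print(normalize("Parsons, Cappiello and Nardelli"))  # parsons cappiello and nardelli
```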

[0037] For each listings N-gram, the N-gram map generator 210 may also calculate a listings N-gram frequency score M(g), where g indicates the particular N-gram. In other words, M(g) may designate the number of stored symbol strings, out of the total number M of stored symbol strings $L(\mathrm{listings}_{\mathrm{all}})$ in the database 110, that contain, for example, the N-gram g. The listings frequency score M(g) thus represents the total number of database listings in which the particular listings N-gram g appears. For example, a listings database containing business listings for Morristown, N.J. may contain M = 5,743 total entries. The listings frequency score M(“k_fl”) may then equal 4, indicating that 4 of the 5,743 total entries contain the N-gram “k_fl”. In another example, the frequency score M(“ion)”) may equal 116, indicating that 116 of the 5,743 total listings contain the N-gram “ion)”. The frequency score may be stored in the N-gram database 240 with the corresponding contiguous sequence of N-symbols and the corresponding list of database listings.

[0038] Based on the listings N-gram frequency score M(g) and the total number of database listings M, the N-gram map generator 210 may generate a listings N-gram frequency ratio R(g). The frequency ratio R(g) may be a function of the number of symbol strings stored in the database 110 that contain the listings N-gram g and the total number of symbol strings $L(\mathrm{listings}_{\mathrm{all}})$ stored in the database 110.

[0039] The frequency ratio R(g) may be calculated by dividing the frequency score M(g) by the total number of database listings M, i.e., $R(g) = M(g)/M$. The listings N-gram frequency ratio R(g) may be used to evaluate the relative importance of a particular listings N-gram g with respect to its corresponding listings. For example, the listings frequency ratio R(“k_fl”) = M(“k_fl”)/M = 4/5,743. In another example, the listings frequency ratio R(“ion)”) = M(“ion)”)/M = 116/5,743. Note that the listings N-gram frequency ratio R(g) for a particular listings N-gram can also be evaluated by dividing the number of entries in the corresponding full or short list List(g) by the total number of entries M in the listings database.

[0040] The frequency ratio R(g) indicates how frequently a particular N-gram g appears across the entire set of database listings (i.e., $L(\mathrm{listings}_{\mathrm{all}})$). A lower ratio indicates that the N-gram g appears less frequently in the database listings; a higher ratio indicates that it appears more frequently. The value of the frequency ratio R(g) can therefore be used to evaluate the “distinguishing power” of a particular N-gram g: the lower the frequency ratio R(g), the more distinguishing power the N-gram has.
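The frequency score M(g), the frequency ratio R(g), and the resulting ordering by distinguishing power can be illustrated with the Morristown figures quoted in paragraph [0037]. In this sketch M(g) is supplied directly as a count; in practice it could be taken as the length of the full list List(g). The function names are hypothetical.

```python
def frequency_ratios(counts: dict[str, int], m_total: int) -> dict[str, float]:
    """R(g) = M(g) / M for each listings N-gram g."""
    return {g: m_g / m_total for g, m_g in counts.items()}


def by_distinguishing_power(ratios: dict[str, float]) -> list[str]:
    """Order N-grams from most to least distinguishing (lowest R(g) first)."""
    return sorted(ratios, key=ratios.get)


M = 5743                                # total listings in the example
counts = {"ion)": 116, "k_fl": 4}       # M(g) values from the text
ratios = frequency_ratios(counts, M)
print(by_distinguishing_power(ratios))  # ['k_fl', 'ion)']
```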

[0041] During an input processing stage, the recognizer 120 may receive a user's communication and may generate a list of recognized symbol strings $(S_1, S_2, S_3, \ldots, S_k)$ and associated first similarity scores $(C_1, C_2, C_3, \ldots, C_k)$. In this case, $C_1$ is the first similarity score associated with $S_1$, $C_2$ is the first similarity score associated with $S_2$, and so on. The recognizer 120 forwards the list of recognized symbol strings to the rough and fast matcher 220.

[0042] In an exemplary embodiment of the present invention, the rough and fast matcher 220 receives the list of recognized symbol strings and associated first similarity scores. The rough and fast matcher 220 may extract one or more recognized N-grams from each of the recognized symbol strings $S_1$ through $S_k$, where N can be, for example, 1, 2, 3, 4, 5, or more symbols or characters in length.

[0043] In an embodiment of the present invention, the value of N is the same for the recognized symbol strings and for the stored symbol strings processed by the N-gram map generator 210. In alternative embodiments, the value of N for the recognized symbol strings may differ from the value of N for the stored symbol strings processed by the N-gram map generator 210. The value of N may be fixed or may vary. In other words, N may be any fixed value, such as 1, 2, 3, 4, or 5 symbols. Alternatively, N may vary from 1 to 3, 1 to 4, 2 to 5, 3 to 4, 3 to 5, or 3 to 6 characters, over any other range of values, or over any subset of a range; for example, N may take the values 1, 2, 4, and 7 characters.
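When N is allowed to vary, extraction simply repeats the sliding window for each permitted length. The sketch below, with a hypothetical `n_values` parameter and the markers of paragraph [0030], accepts either a contiguous range such as 3 to 5 or an arbitrary set such as 1, 2, 4, and 7; a window longer than the marked string contributes no N-grams.

```python
from collections.abc import Iterable


def extract_ngrams_multi(entry: str, n_values: Iterable[int] = (3, 4, 5)) -> list[str]:
    """Collect N-grams of every length listed in n_values."""
    marked = "(" + entry.lower().replace(" ", "_") + ")"
    grams: list[str] = []
    for n in n_values:
        # range() is empty when n exceeds len(marked), so long windows add nothing.
        grams.extend(marked[i:i + n] for i in range(len(marked) - n + 1))
    return grams


print(len(extract_ngrams_multi("Park Flowers", range(3, 5))))  # 23 (12 of length 3, 11 of length 4)
print(extract_ngrams_multi("ab", (1, 2)))  # ['(', 'a', 'b', ')', '(a', 'ab', 'b)']
```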

[0044] From the recognized symbol string $S_1$, one or more recognized N-grams $g_1^1, g_1^2, \ldots, g_1^{w(1)}$ are extracted, where $g_1^{w(1)}$ is the last N-gram extracted from symbol string $S_1$. Likewise, from the recognized symbol string $S_k$, one or more N-grams $g_k^1, g_k^2, \ldots, g_k^{w(k)}$ are extracted, where $g_k^{w(k)}$ is the last N-gram extracted from symbol string $S_k$.

[0045] In an exemplary embodiment of the present invention, the rough and fast matcher 220 compares each recognized N-gram g extracted from the recognized symbol strings with the listings N-grams stored in the N-gram database 240 to find one or more identical matches. For every database entry stored in the listings database 110 (or every entry in the one or more full or short lists List(g)) that is mapped to a matched listings N-gram, a second similarity score is created. The rough and fast matcher 220 may generate a preliminary set of symbol strings based on the second similarity score associated with each entry. The preliminary set of symbol strings may include one or more symbol strings from the full or short lists List(g) that are mapped to the matched listings N-grams. Although full or short lists are described herein as a way to narrow the field of search required during the processing stage, one of ordinary skill in the art can use other techniques to achieve similar efficiencies. For example, the entire database 110, or subsets of the database 110, can be created and/or used to match the N-grams and generate the preliminary set of symbol strings.

[0046] In embodiments of the present invention, the second similarity score for a listings symbol string in a preliminary set may be calculated from elementary second similarity scores, for example, as the sum of the elementary second similarity scores of the recognized and matched N-grams that appear in this listings symbol string. In embodiments of the present invention, the frequency ratio R(g) may be used to generate an N-gram elementary second similarity score ESSS(g) for each of the recognized N-grams. The N-gram elementary second similarity score ESSS(g) may be equal to the frequency ratio R(g), or may be calculated as a function of the frequency ratio R(g). For example, the N-gram elementary second similarity score may be calculated as $\mathrm{ESSS}(g) = -\log R(g)$ or as any other function of the ratio R(g). The quantity $\mathrm{ESSS}(g) = -\log\big(M(g)/M\big)$ can be referred to as an information-theoretic importance measure: it represents the information that the N-gram g carries for distinguishing among the symbol strings of the whole listings database. In alternative embodiments of the present invention, the N-gram elementary second similarity score ESSS(g) for each of the extracted N-grams may be a predetermined fixed value, such as 1, 2, 3, etc.
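The three variants of the elementary score mentioned in paragraph [0046] (the ratio itself, its negative logarithm, and a fixed value) can be compared with the sketch below. The natural logarithm and the mode names are assumptions; the text does not specify a logarithm base.

```python
import math


def esss(m_g: int, m_total: int, mode: str = "neg_log", fixed: float = 1.0) -> float:
    """Elementary second similarity score ESSS(g) for one listings N-gram."""
    r = m_g / m_total              # frequency ratio R(g)
    if mode == "ratio":
        return r                   # ESSS(g) = R(g)
    if mode == "neg_log":
        return -math.log(r)        # ESSS(g) = -log R(g), natural log assumed
    if mode == "fixed":
        return fixed               # predetermined constant
    raise ValueError(f"unknown mode: {mode}")


# With -log R(g), the rarer N-gram "k_fl" earns the larger score.
print(round(esss(4, 5743), 2), round(esss(116, 5743), 2))
```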

[0047] The N-gram elementary second similarity scores ESSS(g) for each matched N-gram that belongs to the same stored symbol string may be added together to generate the second similarity score. In embodiments of the present invention, the N-gram elementary second similarity scores ESSS(g) for N-grams from different recognized symbol strings that belong to a particular stored symbol string may be multiplied by the first similarity scores associated with those recognized symbol strings and summed to obtain the second similarity score for that particular stored symbol string.

[0048] In an embodiment of the present invention, the rough and fast matcher takes as input a list of pairs {(S₁, C₁), . . . , (S_k, C_k)}, where S_i is a recognized symbol string and C_i is the first similarity score associated with it. Then, for every distinct listing listing_j from the database, a second similarity score SSS2(listing_j) is computed. For example, the symbol strings S_i are scanned one by one and all of their N-grams are extracted. If N-gram g is extracted from S_i, then for every listing listing_j in the full or short list List(g), the second similarity score SSS2(listing_j) is updated by adding the elementary second similarity score ESSS(g) multiplied by C_i, the first similarity score of the symbol string S_i:

[0049] $\mathrm{SSS2}(\mathrm{listing}_j) \leftarrow \mathrm{SSS2}(\mathrm{listing}_j) + \mathrm{ESSS}(g) \cdot C_i$
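
The update rule of [0048] and [0049], together with the starting values of [0050], might be sketched as below. `nbest` is the list of (S_i, C_i) pairs, `ngram_map` stands in for the lists List(g), `esss` holds precomputed ESSS(g) values and `prior` optionally supplies a priori starting scores; these names are illustrative assumptions. A further assumption: an N-gram that occurs twice in the same S_i is counted twice.

```python
from collections import defaultdict


def ngrams(s: str, n: int = 3) -> list[str]:
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def second_scores(nbest, ngram_map, esss, n=3, prior=None):
    """Compute SSS2(listing_j) by adding ESSS(g) * C_i for every recognized
    N-gram g of S_i and every listing_j in List(g)."""
    sss2 = defaultdict(float)   # starting value 0 ...
    sss2.update(prior or {})    # ... unless an a priori starting score is given
    for s_i, c_i in nbest:
        for g in ngrams(s_i, n):
            for listing_j in ngram_map.get(g, ()):
                sss2[listing_j] += esss[g] * c_i
    return dict(sss2)


nbest = [("dante", 0.8), ("dandy", 0.2)]
ngram_map = {"dan": {0, 1}, "ant": {0}, "nte": {0}, "and": {1}, "ndy": {1}}
esss = {"dan": 0.5, "ant": 1.0, "nte": 1.0, "and": 1.0, "ndy": 1.0}
print(second_scores(nbest, ngram_map, esss))  # listing 0 about 2.1, listing 1 about 0.9
```

Because each contribution is weighted by C_i, N-grams from the recognizer's more confident hypotheses dominate the score.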

[0050] In embodiments of the present invention, the starting (initial) values of the second similarity scores may all be set to the same value, for example, 0. Alternatively, the starting values may differ between listings symbol strings to reflect a priori information about the probability of each listings symbol string, thus giving some advantage to more probable listings symbol strings.

[0051] Once second similarity scores have been established for one or more stored listings symbol strings, a threshold, for example, may be established to determine the preliminary set of symbol strings. For example, a second similarity score threshold may specify that any stored listings symbol string whose second similarity score meets or exceeds the threshold is included in the preliminary set of symbol strings, while any stored listings symbol string whose second similarity score falls below the threshold is not. For example, if the second similarity score threshold is set at 50 points (where 100 points is the highest second similarity score), any stored symbol string with a second similarity score of 50 points or more would be included in the preliminary set, and any stored symbol string with a second similarity score below 50 points would not.

[0052] In embodiments of the present invention, the second similarity threshold may be an absolute threshold or a relative threshold (e.g., relative to the maximum of the second similarity scores of the stored symbol strings). In alternative embodiments of the present invention, other suitable methods may be used to determine which symbol strings are included in the preliminary set of symbol strings. For example, the symbol strings with the ten (10) highest second similarity scores may be included in the preliminary set of symbol strings.
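
Threshold-based selection of the preliminary set (and, equally, of the refined set) can be sketched with a single hypothetical helper, `select_candidates`. It assumes non-negative scores, so that a relative threshold can be expressed as a fraction of the best score; the 50-point and top-ten examples in the text correspond to `absolute=50` and `top_k=10`.

```python
def select_candidates(scores, *, absolute=None, relative=None, top_k=None):
    """Return listing ids ranked by score, keeping those that pass an absolute
    threshold, a threshold relative to the best score, and/or a top-k cut."""
    if not scores:
        return []
    best = max(scores.values())
    ranked = sorted(scores, key=scores.get, reverse=True)
    kept = [
        j for j in ranked
        if (absolute is None or scores[j] >= absolute)
        and (relative is None or scores[j] >= relative * best)
    ]
    return kept if top_k is None else kept[:top_k]


scores = {"A": 80, "B": 50, "C": 49, "D": 10}
print(select_candidates(scores, absolute=50))   # ['A', 'B']
print(select_candidates(scores, relative=0.6))  # ['A', 'B', 'C']
print(select_candidates(scores, top_k=2))       # ['A', 'B']
```

The criteria can also be combined, for example a relative threshold followed by a top-k cap to bound the size of the preliminary set.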

[0053] In one embodiment of the present invention, the refined matcher 230 receives the preliminary set of symbol strings from the rough and fast matcher 220 and may compute a third similarity score associated with the one or more symbol strings included in the preliminary set of symbol strings. The refined matcher 230 may also receive one or more N-grams associated with entries included in the preliminary set of symbol strings. Based on the third similarity score, the refined matcher 230 may output a refined set of symbol strings. The refined set of symbol strings may include the best or closest match for the user's communication originally received by the recognizer 120.

[0054] In embodiments of the present invention, the third similarity score may be determined by evaluating the list of recognized symbol strings with respect to the preliminary set of symbol strings, including the one or more stored symbol strings. For each recognized N-gram, the refined matcher 230 may also calculate a refined N-gram frequency score m(g), where g is any N-gram from the recognized symbol strings. In other words, m(g) is the number of stored symbol strings in the preliminary set, out of the total number m of stored symbol strings in the preliminary set, that contain the recognized N-gram g. The refined N-gram frequency score m(g) thus represents the number of listings in the preliminary set of symbol strings in which the recognized N-gram g appears.

[0055] Based on the refined N-gram frequency score m(g) and the number m of listings in the preliminary set of symbol strings, the refined matcher 230 may generate a refined N-gram frequency ratio r(g). The refined frequency ratio r(g) may be the number of stored symbol strings in the preliminary set of symbol strings that contain the recognized N-gram g divided by the total number of stored symbol strings in the preliminary set of symbol strings.

[0056] Thus, just as the frequency ratio R(g)=M(g)/M is computed over the whole listings database, the refined matcher 230 may calculate the refined frequency ratio r(g) by dividing the refined N-gram frequency score m(g) by m, the total number of listings in the preliminary set of symbol strings rather than in the database (i.e., r(g)=m(g)/m).
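
A minimal sketch of the refined ratio r(g) = m(g)/m follows, assuming character N-grams with N = 3; `refined_ratios` is a hypothetical name. It illustrates that m is the size of the preliminary set, so an N-gram that is rare in the whole database may still be common among the candidates.

```python
def ngram_set(s: str, n: int = 3) -> set[str]:
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def refined_ratios(preliminary: list[str], recognized_ngrams, n: int = 3) -> dict[str, float]:
    """r(g) = m(g) / m, where m(g) counts preliminary strings containing g
    and m is the size of the preliminary set (not of the whole database)."""
    m = len(preliminary)
    if m == 0:
        return {}
    grams = [ngram_set(s, n) for s in preliminary]
    return {g: sum(g in gs for gs in grams) / m for g in recognized_ngrams}


r = refined_ratios(["dantes", "dandy", "danube"], ["dan", "ant", "nte"])
print(r)  # "dan" is in all three strings (1.0); "ant" and "nte" only in "dantes"
```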

[0057] The refined frequency ratio r(g) may be used to evaluate the relative importance of a particular N-gram g with respect to the stored listings symbol strings in the preliminary set of symbol strings. The refined frequency ratio r(g) indicates how frequently a particular N-gram g appears in the preliminary set of symbol strings: a lower ratio indicates that g appears less frequently, and a higher ratio indicates that g appears more frequently. The value of the refined frequency ratio r(g) can therefore be used to evaluate the “distinguishing power” of a particular N-gram g with respect to the preliminary set of symbol strings: the lower the refined frequency ratio r(g), the more distinguishing power the N-gram has.

[0058] In embodiments of the present invention, the third similarity score TSS3(listing_j) for a stored listing listing_j from a preliminary set of symbol strings may be calculated from elementary third similarity scores ETSS(g), for example, as the sum of the elementary third similarity scores ETSS(g) for the recognized and matched N-grams that appear in this listings symbol string. In embodiments of the present invention, the refined frequency ratio r(g) may be used to generate an N-gram elementary third similarity score ETSS(g) for each of the recognized N-grams. ETSS(g) may be equal to the refined frequency ratio r(g), or may be calculated as a function of r(g). For example, ETSS(g) may be calculated as −log(r(g)) or some other function of the refined ratio r(g). The quantity ETSS(g)=−log(m(g)/m) can be referred to as an information-theoretic importance measure, which represents the information about the listings, taken over the preliminary listings symbol strings, that is contained in N-gram g. In alternative embodiments of the present invention, the N-gram elementary third similarity score ETSS(g) for each of the extracted N-grams may be a predetermined fixed value such as 1, 2, 3, etc.
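
The options listed for turning a ratio into an elementary score can be sketched as one hypothetical function that applies equally to R(g) and r(g). It assumes the natural logarithm and drops N-grams whose ratio is 0, since such N-grams match no candidate. Note that the plain ratio option rewards common N-grams, whereas −log rewards rare ones, in line with the distinguishing-power argument of [0057].

```python
import math


def elementary_scores(ratios: dict[str, float], mode: str = "neglog", fixed: float = 1.0) -> dict[str, float]:
    """Map each N-gram's ratio to an elementary score: -log(ratio),
    the ratio itself, or a predetermined fixed value."""
    out = {}
    for g, ratio in ratios.items():
        if ratio <= 0:
            continue  # g occurs in no candidate, so it cannot contribute
        if mode == "neglog":
            out[g] = -math.log(ratio)
        elif mode == "ratio":
            out[g] = ratio
        elif mode == "fixed":
            out[g] = fixed
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out


etss = elementary_scores({"dan": 1.0, "ant": 1 / 3, "xyz": 0.0})
print(etss)  # "dan" scores 0 (no distinguishing power); "ant" scores log 3
```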

[0059] In an embodiment of the present invention, the refined matcher takes as input a list of pairs {(S₁, C₁), . . . , (S_k, C_k)}, where S_i is a recognized symbol string and C_i is the first similarity score associated with it. Then, for every distinct listing listing_j in the preliminary set, the third similarity score TSS3(listing_j) is computed. For example, the symbol strings S_i may be scanned one by one and all of their N-grams extracted. If N-gram g is extracted from S_i, then for every listing listing_j in the preliminary set that also appears in the full or short list List(g), the third similarity score TSS3(listing_j) is updated by adding the elementary third similarity score ETSS(g) multiplied by C_i, the first similarity score of the symbol string S_i:

[0060] $\mathrm{TSS3}(\mathrm{listing}_j) \leftarrow \mathrm{TSS3}(\mathrm{listing}_j) + \mathrm{ETSS}(g) \cdot C_i$
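
Restricting the accumulation to the preliminary set, the third similarity score might be computed as in this sketch. `preliminary` maps hypothetical listing ids to listing strings and `etss` holds precomputed ETSS(g) values; in the usage lines the ETSS values are set by hand for a three-listing preliminary set in which "dan" occurs everywhere while "ant" and "nte" occur only in "dantes".

```python
import math


def ngram_list(s: str, n: int = 3) -> list[str]:
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def third_scores(nbest, preliminary, etss, n=3, prior=None):
    """TSS3(listing_j) for every listing_j in the preliminary set,
    accumulated as ETSS(g) * C_i over the recognized N-grams g of S_i."""
    grams = {j: set(ngram_list(s, n)) for j, s in preliminary.items()}
    prior = prior or {}
    tss3 = {j: prior.get(j, 0.0) for j in preliminary}
    for s_i, c_i in nbest:
        for g in ngram_list(s_i, n):
            if g not in etss:
                continue
            for j, gs in grams.items():
                if g in gs:  # listing_j is a candidate and contains g
                    tss3[j] += etss[g] * c_i
    return tss3


preliminary = {0: "dantes", 1: "dandy", 2: "danube"}
etss = {"dan": 0.0, "ant": math.log(3), "nte": math.log(3)}
print(third_scores([("dante", 1.0)], preliminary, etss))
```

Only "dantes" gains score here: the shared prefix "dan" no longer separates the candidates once the field has been narrowed.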

[0061] In embodiments of the present invention, the starting (initial) values of the third similarity scores may all be set to the same value, for example, 0. Alternatively, the starting values may differ between listings symbol strings to reflect a priori information about the probability of each listings symbol string, thus giving some advantage to more probable listings symbol strings.

[0062] Once third similarity scores have been established for one or more stored listings symbol strings, a threshold, for example, may be established to determine the refined set of symbol strings. In embodiments of the present invention, the third similarity score threshold may be an absolute threshold or a relative threshold (e.g., relative to the maximum of the third similarity scores). In alternative embodiments of the present invention, other suitable methods may be used to determine which symbol strings are included in the refined set of symbol strings. For example, the symbol strings with the ten (10) highest third similarity scores may be included in the refined set of symbol strings.

[0063] In further embodiments of the invention, before an output is presented to the user, further processing may occur by evaluating the recognized symbol strings with respect to the refined set of symbol strings in the same way as is done with the preliminary set of symbol strings, yielding a more refined set of symbol strings. In accordance with embodiments of the present invention, this process can be repeated.
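
The repeated refinement could look like the loop below. This is a sketch under assumptions not stated in the text: each pass recomputes −log(m(g)/m) over the current candidates, keeps a fixed `top_k`, and stops when the candidate set no longer changes or after `max_rounds` passes. All names are hypothetical.

```python
import math


def ngrams(s: str, n: int = 3) -> list[str]:
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def refine_once(nbest, candidates, n=3, top_k=3):
    """One pass: weight each recognized N-gram by -log(m(g)/m) over the
    current candidate set, score the candidates, and keep the top_k."""
    grams = {j: set(ngrams(s, n)) for j, s in candidates.items()}
    m = len(candidates)
    scores = {j: 0.0 for j in candidates}
    for s_i, c_i in nbest:
        for g in ngrams(s_i, n):
            holders = [j for j, gs in grams.items() if g in gs]
            if not holders:
                continue
            weight = -math.log(len(holders) / m) * c_i
            for j in holders:
                scores[j] += weight
    ranked = sorted(candidates, key=lambda j: scores[j], reverse=True)[:top_k]
    return {j: candidates[j] for j in ranked}, scores


def refine(nbest, candidates, n=3, top_k=3, max_rounds=5):
    """Repeat refine_once until the candidate set stops changing."""
    for _ in range(max_rounds):
        refined, _scores = refine_once(nbest, candidates, n, top_k)
        if refined.keys() == candidates.keys():
            break
        candidates = refined
    return candidates


cands = {0: "dantes", 1: "dandy", 2: "danube", 3: "pantry", 4: "sandy"}
print(refine([("dante", 1.0)], cands, top_k=2))  # {0: 'dantes', 3: 'pantry'}
```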

[0064] In embodiments of the present invention, the refined matcher outputs the refined set of symbol strings to the output manager 130 for processing. Depending on the distribution of the third similarity scores for symbol strings from the refined set and/or on other similarity measures, such as the Levenshtein distance between the recognized symbol strings and the symbol strings from the refined set, the output manager 130 may decide which listing the user meant and may route a call for the user or retrieve and/or present the requested information.
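
The text leaves the output manager's decision rule open. The sketch below shows one possible rule, which is entirely an assumption: route the call when the best refined listing beats the runner-up by a factor `margin` and lies within `max_dist` edits (Levenshtein distance) of the best recognized string; otherwise offer the ranked choices. The threshold values and function names are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def decide(refined_scores, top_recognized, margin=1.5, max_dist=3):
    """Hypothetical rule: route directly on a clear, well-spelled winner,
    otherwise return the ranked choices for the user to pick from."""
    ranked = sorted(refined_scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return "reprompt", []
    best, s1 = ranked[0]
    s2 = ranked[1][1] if len(ranked) > 1 else 0.0
    clear_winner = s2 <= 0 or s1 >= margin * s2
    if clear_winner and levenshtein(top_recognized, best) <= max_dist:
        return "route", [best]
    return "offer_choices", [name for name, _ in ranked]


print(levenshtein("kitten", "sitting"))  # 3
print(decide({"dantes restaurant": 4.0, "dandy diner": 1.0}, "dantes restaurant"))
```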

[0065] Depending on the same distributions, the output manager 130 may forward the refined set of symbol strings to the user for selection. Based on the user's selection, the output manager 130 may route a call for the user or retrieve and present the requested information.

[0066] Depending on the same distributions, the output manager 130 may present another prompt to the user, terminate the call if the desired results have been achieved, or perform other steps to output a desired result for the user. If the output manager 130 presents another prompt to the user, for example, asking the user to input the desired listing name once more, the new recognized symbol strings may help the output manager make the final decision about the user's goal. This can be done by changing the distribution of the third similarity scores, for example, by adding the third similarity scores for symbol strings in the refined set computed from the first user input to the third similarity scores for symbol strings in the refined set computed from the second user input.
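
Combining two attempts in this way amounts to summing per-listing scores. A minimal sketch using Python's `collections.Counter`, with hypothetical listing names and scores; a listing that appears in only one refined set simply keeps its single score.

```python
from collections import Counter


def combine_attempts(*attempts: dict[str, float]) -> list[tuple[str, float]]:
    """Add up third similarity scores per listing across several user inputs
    and return the listings ranked by the combined score."""
    total: Counter = Counter()
    for scores in attempts:
        total.update(scores)  # Counter.update adds values for existing keys
    return total.most_common()


first = {"dantes": 2.0, "dandy": 1.8}
second = {"dantes": 1.5, "danube": 1.7}
print(combine_attempts(first, second))
# [('dantes', 3.5), ('dandy', 1.8), ('danube', 1.7)]
```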

[0067] It is recognized that the configuration of the communication(s) processing system 100 and the database entry matcher 140 as shown in FIGS. 1 and 2, and the corresponding description above, is given by way of example only, and modifications that fall within the spirit of the invention can be made to the communication(s) processing system 100 and to the database entry matcher 140. For example, in alternative embodiments of the invention, the database entry matcher 140 and/or its functionality may be incorporated into the recognizer, the output manager, and/or any combination thereof. In yet further embodiments of the present invention, the intelligence of the communication(s) processing system 100 may be integrated into one or more application-specific integrated circuits (ASICs) and/or one or more software programs. In another example, the N-gram map generator 210, the rough and fast matcher 220, and/or the refined matcher 230 may also be combined into one or more hardware or software components. Embodiments of the present invention can also be employed in known and/or new Internet search engines, for example, to search the World Wide Web.

[0068] Referring now to FIGS. 3 and 4, the method for automatically recognizing a user's communication in accordance with exemplary embodiments of the present invention will now be described. FIG. 3 illustrates a flow chart of a method in accordance with an exemplary embodiment of the present invention. FIG. 4 is an example of a user's communication that is processed in accordance with the method described in the flow chart of FIG. 3. A user may call, for example, directory assistance to locate the telephone number, address, and/or other information for a particular individual, organization, agency, business, etc. Once the call is connected, an automated communication processing system 100, for example, may receive the call and ask the user to enter search criteria. The communication processing system 100 may include an automated attendant, an IVR, or another suitable automated answering service. The search criteria could be, for example, the name of a business for which additional information is required. The search criteria could be a user's communication in the form of spoken inputs, inputs entered via a keypad or keyboard, or other suitable inputs.

[0069] The recognizer 120 located in the communication processing system 100 may receive a user's communication. As shown in FIG. 4, for example, the user's communication 401 may be a spoken request “Dantes Restaurant” 402 in response to a request for search criteria from the communication processing system 100. The recognizer 120 may search the recognizer grammar list 450, created from the recognizer listings database 453, for an N-best match. The recognizer grammar list 450 and the listings database 453 may be stored in, for example, the database 110. In this example, the listings database 453 contains a plurality of symbol strings representing, for example, names of local restaurants. The recognizer grammar list 450 includes entries that represent the different ways users may refer to the symbol strings stored in the listings database 453.

[0070] In embodiments of the present invention, the N-gram map generator 210 may extract one or more contiguous sequences of N-symbols (listings N-grams) from the plurality of symbol strings stored in the listings database 453. The N-gram map generator 210 may create an N-gram map 455 containing the extracted contiguous sequences of N-symbols, each mapped to the corresponding symbol strings stored in the listings database 453. The N-gram map 455 may be stored in, for example, the N-gram database 240.
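
A minimal sketch of what the N-gram map generator 210 might compute, assuming character trigrams and case-insensitive matching (both assumptions). The resulting dictionary plays the role of the N-gram map 455: each key is a listings N-gram g and each value is the list List(g), whose length is M(g). The function names are hypothetical.

```python
from collections import defaultdict


def ngrams(s: str, n: int = 3) -> list[str]:
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def build_ngram_map(listings: list[str], n: int = 3) -> dict[str, list[int]]:
    """Inverted index from each listings N-gram g to the sorted ids of the
    listings that contain it, i.e. a full list List(g)."""
    index: dict[str, set[int]] = defaultdict(set)
    for listing_id, text in enumerate(listings):
        for g in ngrams(text.lower(), n):  # case folding is an assumption
            index[g].add(listing_id)
    return {g: sorted(ids) for g, ids in index.items()}


ngram_map = build_ngram_map(["Dantes", "Dandy"])
print(ngram_map["dan"])       # [0, 1]
print(len(ngram_map["dan"]))  # M(g) for g = "dan": 2
```

Building this index once, offline, is what lets the rough and fast matcher touch only the listings that share at least one N-gram with the recognized strings.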

[0071] The recognizer 120 may generate a list of recognized symbol strings 403 including one or more recognized entries 405 based on the received user's communication 402, and may compute the associated first similarity scores 407 for the one or more recognized entries. The recognizer 120 may transmit the list of recognized symbol string entries 405 and the associated first similarity scores 407 to the database entry matcher 140. The rough and fast matcher 220, located in the database entry matcher 140, receives the list of recognized symbol strings 405 and the associated first similarity scores 407 (3070).

[0072] The rough and fast matcher 220 extracts, from each of the recognized symbol strings 405 in the list of recognized symbol strings 403, one or more contiguous sequences of N-symbols 411 (recognized N-grams) (3090). The rough and fast matcher 220 may match at least one of the extracted contiguous sequences of N-symbols 411 with at least one stored contiguous sequence of N-symbols from the N-gram map 455 stored in the N-gram database 240 (3110). The rough and fast matcher 220 then generates a preliminary set of symbol strings 413 based on the associated second similarity scores 417 (3130). The preliminary set of symbol strings 413 may include one or more stored symbol strings from the listings database 453 that correspond to the matched contiguous sequences of N-symbols.

[0073] The refined matcher 230 computes the third similarity scores 423 associated with the one or more stored symbol strings 415 included in the preliminary set of symbol strings 413 and outputs a refined set of symbol strings 421 (3150). The refined set of symbol strings 421 may be output based on the computed third similarity scores 423 (3170).

[0074] In embodiments of the present invention, the refined set of symbol strings 421 may be output to the output manager 130. In an exemplary embodiment, the output manager 130 may directly output the refined set of symbol strings 421 to the user for selection. In embodiments of the present invention, the output manager 130, for example, may retrieve additional information corresponding to the user's selection and present such information to the user. Such additional information may include, for example, a corresponding telephone number, mailing address, e-mail address, etc. This additional information may be located in, for example, database 110 and/or any other informational database. In alternative embodiments of the invention, the output manager 130 may offer to connect the user with the selection if the user is satisfied with the resulting set of symbol strings presented. However, if the user is unsatisfied, the output manager 130 may return the refined set of symbol strings 421 to the refined matcher 230 and/or the rough and fast matcher 220 for further processing in accordance with embodiments of the present invention.

[0075] It is recognized that any suitable hardware, software, and/or any combination thereof may be used to implement the above-described embodiments of the present invention. The systems and/or apparatus shown in FIGS. 1-2 and described in the corresponding text, and the methods shown in FIGS. 3-4 and described in the corresponding text, can be implemented using hardware and/or software that are well within the knowledge and skill of persons of ordinary skill in the art.

[0076] Several embodiments of the present invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and are within the purview of the appended claims without departing from the spirit and intended scope of the invention.

We claim:
 1. A method for processing a user's communication comprising: receiving a list of recognized symbol strings of one or more recognized entries and a first similarity score associated with each recognized entry; extracting from each recognized symbol string one or more contiguous sequences of N-symbols; matching at least one of the extracted contiguous sequences of N-symbols with at least one stored contiguous sequence of N-symbols from a first database; generating a preliminary set of symbol strings and associated second similarity scores, the preliminary set of symbol strings including one or more stored symbol strings from a second database that correspond to the at least one of the matched contiguous sequences of N-symbols; computing a third similarity score associated with the one or more stored symbol strings included in the preliminary set of symbol strings; and outputting a refined set of symbol strings from the preliminary set of symbol strings based on the computed third similarity score.
 2. The method of claim 1, wherein the extracted contiguous sequences of N-symbols and stored contiguous sequences of N-symbols include at least four symbols.
 3. The method of claim 2, wherein the extracted contiguous sequences of N-symbols and stored contiguous sequences of N-symbols are of a fixed length.
 4. The method of claim 1, wherein the extracted contiguous sequences of N-symbols and stored contiguous sequences of N-symbols include at least one of one, two, three, four, five and six symbols.
 5. The method of claim 4, wherein the extracted contiguous sequences of N-symbols and stored contiguous sequences of N-symbols are of the same fixed length.
 6. The method of claim 1, wherein the extracted contiguous sequences of N-symbols and stored contiguous sequences of N-symbols range from at least three to six symbols.
 7. The method of claim 1, wherein the lengths of the extracted contiguous sequences of N-symbols and stored contiguous sequences of N-symbols are at least one of a fixed value, a range of values and a subset of the range of values.
 8. The method of claim 1, further comprising: generating from a received user's communication the list of recognized symbol strings of the one or more recognized entries; and computing the first similarity score associated with each recognized entry contained in the generated list of recognized symbol strings.
 9. The method of claim 1, wherein the third similarity score is computed based on associated information theoretic importance measures.
 10. The method of claim 9, wherein the associated information theoretic importance measures are calculated using the formula −log(m(g)/m), where m(g) represents refined N-gram frequency scores and m represents the number of stored symbol strings included in the preliminary set of symbol strings.
 11. The method of claim 1, wherein the second similarity score is computed based on associated information theoretic importance measures.
 12. The method of claim 11, wherein the associated information theoretic importance measures are calculated using the formula −log(M(g)/M), where M(g) represents listings N-gram frequency scores and M represents the total number of stored symbol strings in the second database.
 13. The method of claim 1, further comprising: extracting from the one or more symbol strings stored in the second database the at least one stored contiguous sequence of N-symbols.
 14. The method of claim 13, further comprising: mapping the extracted at least one stored contiguous sequence of N-symbols with corresponding one or more symbol strings stored in the second database.
 15. The method of claim 14, further comprising: storing mapping information relating at least one of the stored contiguous sequences of N-symbols to the corresponding one or more symbol strings stored in the second database and the second similarity scores associating this particular stored contiguous sequence of N-symbols with the corresponding symbol strings containing it.
 16. The method of claim 1, further comprising: computing the associated second similarity scores for the one or more symbol strings stored in the second database included in the preliminary set of symbol strings as a function of at least a number of the contiguous sequences of N-symbols from the list of recognized symbol strings of one or more recognized entries encountered in the symbol string for which the associated second similarity score is being computed.
 17. The method of claim 1, further comprising: computing the associated second similarity scores for the one or more symbol strings stored in the second database included in the preliminary set of symbol strings based on at least a ratio of a number of the one or more symbol strings stored in the second database that contain the matched stored contiguous sequence of N-symbols and a total number of the one or more symbol strings stored in the second database.
 18. The method of claim 1, further comprising: computing the associated third similarity score as a function of at least a number of the one or more contiguous sequences of N-symbols extracted from each recognized symbol string that appear in the one or more symbol strings stored in the second database included in the preliminary set of symbol strings.
 19. The method of claim 1, further comprising: computing the third similarity score based on a ratio of a number of the one or more symbol strings stored in the second database included in the preliminary set of symbol strings containing the extracted contiguous sequences of N-symbols and a total number of stored symbol strings that appear in the preliminary set of symbol strings.
 20. A machine-readable medium having stored thereon a plurality of executable instructions, the plurality of instructions comprising instructions to: receive a list of recognized symbol strings of one or more recognized entries and a first similarity score associated with each recognized entry; extract from each recognized symbol string one or more contiguous sequences of N-symbols; match at least one of the extracted contiguous sequences of N-symbols with at least one stored contiguous sequence of N-symbols from a first database; generate a preliminary set of symbol strings and associated second similarity scores, the preliminary set of symbol strings including one or more stored symbol strings from a second database that correspond to the at least one of the matched contiguous sequences of N-symbols; compute a third similarity score associated with the one or more stored symbol strings included in the preliminary set of symbol strings; and output a refined set of symbol strings from the preliminary set of symbol strings based on the computed third similarity score.
 21. The machine-readable medium of claim 20 having stored thereon additional executable instructions, the additional instructions comprising instructions to: generate from a received user's communication the list of recognized symbol strings of the one or more recognized entries; and compute the first similarity score associated with each recognized entry contained in the generated list of recognized symbol strings.
 22. The machine-readable medium of claim 20 having stored thereon additional executable instructions, the additional instructions comprising instructions to: extract from the one or more symbol strings stored in the second database the at least one stored contiguous sequence of N-symbols.
 23. The machine-readable medium of claim 22 having stored thereon additional executable instructions, the additional instructions comprising instructions to: map the extracted at least one stored contiguous sequence of N-symbols with corresponding one or more symbol strings stored in the second database.
 24. The machine-readable medium of claim 23 having stored thereon additional executable instructions, the additional instructions comprising instructions to: store mapping information relating at least one of the stored contiguous sequences of N-symbols to the corresponding one or more symbol strings stored in the second database and the second similarity scores associating this particular stored contiguous sequence of N-symbols with the corresponding symbol strings containing it.
 25. The machine-readable medium of claim 20 having stored thereon additional executable instructions, the additional instructions comprising instructions to: compute the associated second similarity scores for the one or more symbol strings stored in the second database included in the preliminary set of symbol strings as a function of at least a number of the contiguous sequences of N-symbols from the list of recognized symbol strings of one or more recognized entries encountered in the symbol string for which the associated second similarity score is being computed.
 26. The machine-readable medium of claim 20 having stored thereon additional executable instructions, the additional instructions comprising instructions to: compute the associated second similarity scores for the one or more symbol strings stored in the second database included in the preliminary set of symbol strings based on at least a ratio of a number of the one or more symbol strings stored in the second database that contain the matched stored contiguous sequence of N-symbols and a total number of the one or more symbol strings stored in the second database.
 27. The machine-readable medium of claim 20 having stored thereon additional executable instructions, the additional instructions comprising instructions to: compute the associated third similarity score as a function of at least a number of the one or more contiguous sequences of N-symbols extracted from each recognized symbol string that appear in the one or more symbol strings stored in the second database included in the preliminary set of symbol strings.
 28. The machine-readable medium of claim 20 having stored thereon additional executable instructions, the additional instructions comprising instructions to: compute the third similarity score based on a ratio of a number of the one or more symbol strings stored in the second database included in the preliminary set of symbol strings containing the extracted contiguous sequences of N-symbols and a total number of stored symbol strings that appear in the preliminary set of symbol strings.
 29. An apparatus for processing a user's communication comprising: an N-gram map generator to extract one or more contiguous sequences of N-symbols from a list of recognized symbol strings of one or more recognized entries; a first matcher to match at least one of the extracted contiguous sequences of N-symbols with at least one stored contiguous sequence of N-symbols and the first matcher further generates a preliminary set of symbol strings and associated second similarity scores, the preliminary set of symbol strings including one or more stored symbol strings that correspond to the matched contiguous sequence of N-symbols; a second matcher to compute a third similarity score corresponding to the one or more stored symbol strings included in the preliminary set of symbol strings; and an output manager to output a refined set of symbol strings from the preliminary set of symbol strings based on the computed third similarity score.
 30. The apparatus of claim29, the further comprising: a first database to store a plurality ofsymbol string entries used to process the user's communication.
 31. Theapparatus of claim 30, wherein the N-gram map generator further extractsfrom each entry in the first database at least one stored contiguoussequence of N-symbols contained in each entry and the apparatus furthercomprises: a second database to store the at least one stored contiguoussequence of N-symbols and a mapping for the corresponding databaseentries.
 32. The apparatus of claim 30, wherein N-gram map generatorfurther maps one or more stored contiguous sequences of N-symbols to atleast one of the plurality of stored database entries that contain theone or more stored contiguous sequence of N-symbols.
 33. The apparatusof claim 29, wherein the first matcher is to compute the associatedsecond similarity scores for the preliminary set of symbol strings basedon at least a ratio of a number of the one or more stored symbol stringsfrom the first database that contain the stored contiguous sequence ofN-symbols and a total number of the one or more stored symbol strings inthe first database.
 34. The apparatus of claim 29, wherein the secondmatcher is to compute the associated third similarity scores for thepreliminary set of symbol strings based on at least a ratio of a numberof the one or more stored symbol strings included in the preliminary setof symbol strings containing the extracted contiguous sequence ofN-symbols and a total number of the one or more stored symbol stringsincluded in the preliminary set of symbol strings.
35. The apparatus of claim 29, further comprising: a recognizer to generate the list of recognized symbol strings of one or more recognized entries from the received user's communication and to compute a first similarity score associated with each entry of the generated list of recognized symbol strings.
36. The apparatus of claim 29, wherein the extracted sequences of N-symbols and the stored contiguous sequences of N-symbols include at least four symbols.
37. The apparatus of claim 36, wherein the extracted sequences of N-symbols and the stored contiguous sequences of N-symbols are of a fixed length.
38. The apparatus of claim 29, wherein the extracted sequences of N-symbols and the stored contiguous sequences of N-symbols include at least one of one, two, three, four, five and six symbols.
39. The apparatus of claim 38, wherein the extracted sequences of N-symbols and the stored contiguous sequences of N-symbols are of the same fixed length.
40. The apparatus of claim 29, wherein the lengths of the extracted contiguous sequences of N-symbols and stored contiguous sequences of N-symbols are at least one of a fixed value, a range of values and a subset of the range of values.
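Claims 36 through 40 allow the N-gram length to be a fixed value, a range of values, or a subset of a range. One way to sketch this, assuming the permitted lengths are passed as any iterable of integers (`extract_ngrams` is a hypothetical name):

```python
from collections.abc import Iterable


def extract_ngrams(s: str, lengths: Iterable[int]) -> list[str]:
    """Contiguous N-symbol sequences of s for every positive N in lengths.

    A fixed length is a one-element iterable such as [4]; lengths longer
    than s simply contribute nothing."""
    return [s[i:i + n] for n in lengths if n > 0 for i in range(len(s) - n + 1)]


print(extract_ngrams("BAKER", [4]))          # fixed N: ['BAKE', 'AKER']
print(extract_ngrams("BAKER", range(3, 5)))  # range: ['BAK', 'AKE', 'KER', 'BAKE', 'AKER']
print(extract_ngrams("BAKER", (3, 5)))       # subset: ['BAK', 'AKE', 'KER', 'BAKER']
```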
41. A method for processing a user's communication comprising: extracting one or more listings N-grams from each symbol string entry in a listings database; mapping one or more particular listings N-grams from the one or more listings N-grams with a list of listings symbol strings that contain the particular listings N-gram; calculating an elementary second similarity score for each entry in the list of listings symbol strings that contain the particular listings N-gram; receiving a list of recognized symbol strings of one or more recognized entries and a first similarity score associated with each recognized entry; extracting from each recognized symbol string one or more recognized N-grams; matching at least one of the recognized N-grams with at least one particular listings N-gram from the one or more particular listings N-grams mapped to the list of listings symbol strings; generating a preliminary set of symbol strings and associated second similarity scores, the preliminary set of symbol strings including one or more symbol strings from the list of listings symbol strings mapped to the at least one of the matched particular listings N-grams; computing a third similarity score associated with the one or more symbol strings included in the preliminary set of symbol strings; and outputting a refined set of symbol strings from the preliminary set of symbol strings based on the computed third similarity score.
42. The method of claim 41, further comprising: calculating a listings N-gram frequency score for the one or more mapped particular listings N-grams, wherein the listings N-gram frequency score represents a number of symbol string entries in the listings database in which the particular N-gram appears.
43. The method of claim 42, further comprising: calculating a listings N-gram frequency ratio for the one or more mapped particular listings N-grams by dividing the listings N-gram frequency score by a total number of symbol string entries in the listings database.
44. The method of claim 43, wherein the elementary second similarity score for each N-gram is based on the calculated listings N-gram frequency ratio and the associated second similarity scores for symbol strings are calculated based on corresponding elementary second similarity scores for N-grams from the recognized symbol strings contained in these symbol strings.
45. The method of claim 41, wherein the preliminary set of symbol strings is generated based on an established threshold limit of the associated second similarity scores.
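Claims 42 through 45 can be sketched as follows. The frequency score and ratio follow claims 42 and 43 directly. Claim 44 says only that the elementary second similarity score is "based on" the ratio, so the negative logarithm used here (an inverse-document-frequency style weight) is an assumption, as are the trigram length, the summation of elementary scores per listing, and the threshold value. Unlike claim 41, which calculates elementary scores when the listings are indexed, this sketch computes them at query time for brevity; `preliminary_set` is a hypothetical name.

```python
import math
from collections import defaultdict


def ngrams(s: str, n: int = 3) -> set[str]:
    """Contiguous n-symbol sequences of s (assumed n = 3)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def preliminary_set(recognized: str, listings: list[str], threshold: float,
                    n: int = 3) -> dict[str, float]:
    total = len(listings)
    # Map each listings N-gram to the listings that contain it.
    ngram_map: dict[str, list[str]] = defaultdict(list)
    for listing in listings:
        for g in ngrams(listing, n):
            ngram_map[g].append(listing)

    second: dict[str, float] = defaultdict(float)
    for g in ngrams(recognized, n):
        containing = ngram_map.get(g, [])
        if not containing:
            continue
        frequency_score = len(containing)          # claim 42: listings containing g
        frequency_ratio = frequency_score / total  # claim 43
        elementary = -math.log(frequency_ratio)    # ASSUMED form of the claim 44 score
        for listing in containing:
            second[listing] += elementary          # ASSUMED: sum per listing
    # Claim 45: keep listings whose second score reaches the threshold.
    return {s: v for s, v in second.items() if v >= threshold}


listings = ["ACME PLUMBING", "ACME PIZZA", "ZENITH BAKERY", "ACE HARDWARE"]
prelim = preliminary_set("AKME PLUMBING", listings, threshold=2.0)
print(sorted((s, round(v, 2)) for s, v in prelim.items()))  # [('ACME PLUMBING', 11.09)]
```

With this weighting, trigrams found in many listings (such as those of "ME P") add little, while trigrams unique to one listing add the most.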
46. The method of claim 41, wherein the list of listings symbol strings that contain the particular listings N-gram is a full list.
47. The method of claim 41, wherein the list of listings symbol strings that contain the particular listings N-gram is a short list.
48. The method of claim 41, further comprising: calculating a refined frequency score for the one or more recognized N-grams, wherein the refined frequency score represents a number of symbol string entries contained in the preliminary set of symbol strings that contain the recognized N-gram.
49. The method of claim 48, further comprising: calculating a refined N-gram frequency ratio for the one or more recognized N-grams by dividing the refined frequency score by a total number of symbol string entries contained in the preliminary set of symbol strings.
50. The method of claim 49, further comprising: calculating an elementary third similarity score for a recognized N-gram and each entry in the preliminary set of symbol strings that contains the recognized N-gram.
51. The method of claim 50, wherein the elementary third similarity score for the recognized N-gram and each entry in the preliminary set of symbol strings that contains the recognized N-gram is based on the calculated refined N-gram frequency ratio and the associated third similarity scores are calculated based on corresponding elementary third similarity scores.
52. The method of claim 41, further comprising: generating the refined set of symbol strings based on an established refined threshold limit of the associated third similarity scores.
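Claims 48 through 52 repeat the frequency calculation, but over the preliminary set rather than the whole listings database. In this sketch the refined frequency score and ratio follow claims 48 and 49; the negative-logarithm elementary third score, its summation per entry, the trigram length, and the threshold value are assumptions, and `refined_set` is a hypothetical name.

```python
import math


def ngrams(s: str, n: int = 3) -> set[str]:
    """Contiguous n-symbol sequences of s (assumed n = 3)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def refined_set(recognized: str, preliminary: list[str], refined_threshold: float,
                n: int = 3) -> dict[str, float]:
    size = len(preliminary)
    entry_grams = {entry: ngrams(entry, n) for entry in preliminary}
    third = {entry: 0.0 for entry in preliminary}
    for g in ngrams(recognized, n):
        containing = [e for e in preliminary if g in entry_grams[e]]
        if not containing:
            continue
        refined_frequency_score = len(containing)       # claim 48
        refined_ratio = refined_frequency_score / size  # claim 49
        elementary = -math.log(refined_ratio)           # ASSUMED form (claims 50-51)
        for entry in containing:
            third[entry] += elementary                  # ASSUMED: sum per entry
    # Claim 52: keep entries whose third score reaches the refined threshold.
    return {e: v for e, v in third.items() if v >= refined_threshold}


preliminary = ["ACME PLUMBING", "ACME PIZZA", "ACME PAINTING"]
result = refined_set("ACME PLUMING", preliminary, refined_threshold=1.0)
print({e: round(v, 2) for e, v in result.items()})  # {'ACME PLUMBING': 3.7}
```

Because the ratio is taken over the preliminary set, trigrams shared by every candidate (here those spanning "ACME P") contribute zero, so the third score is driven by the trigrams that distinguish the candidates.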
53. An apparatus for processing a user's communication comprising: an N-gram map generator that extracts one or more listings N-grams from each symbol string entry in a listings database, maps one or more particular listings N-grams from the one or more listings N-grams with a list of listings symbol strings that contain the particular listings N-gram and calculates an elementary second similarity score for each entry in the list of listings symbol strings that contain the particular listings N-gram; a first matcher that receives a list of recognized symbol strings of one or more recognized entries and a first similarity score associated with each recognized entry, extracts from each recognized symbol string one or more recognized N-grams, matches at least one of the recognized N-grams with at least one particular listings N-gram from the one or more particular listings N-grams mapped to the list of listings symbol strings and generates a preliminary set of symbol strings and associated second similarity scores, the preliminary set of symbol strings including one or more symbol strings from the list of listings symbol strings mapped to the at least one of the matched particular listings N-grams; and a second matcher that computes a third similarity score associated with the one or more symbol strings included in the preliminary set of symbol strings and outputs a refined set of symbol strings from the preliminary set of symbol strings based on the computed third similarity score.
54. The apparatus of claim 53, wherein the N-gram map generator further calculates a listings N-gram frequency score for the one or more mapped particular listings N-grams, wherein the listings N-gram frequency score represents a number of symbol string entries in the listings database in which the particular N-gram appears.
55. The apparatus of claim 54, wherein the N-gram map generator further calculates a listings N-gram frequency ratio for the one or more mapped particular listings N-grams by dividing the listings N-gram frequency score by a total number of symbol string entries in the listings database.
56. The apparatus of claim 53, further comprising: an N-gram database that stores the one or more listings N-grams and the mapped list of listings symbol strings that contain the particular listings N-gram.
57. The apparatus of claim 53, wherein the second matcher further calculates a refined frequency score for the one or more recognized N-grams, wherein the refined frequency score represents a number of symbol string entries contained in the preliminary set of symbol strings that contain the recognized N-gram.
58. The apparatus of claim 57, wherein the second matcher further calculates a refined N-gram frequency ratio for the one or more recognized N-grams by dividing the refined frequency score by a total number of symbol string entries contained in the preliminary set of symbol strings.
59. The apparatus of claim 53, wherein the second matcher further calculates an elementary third similarity score for a recognized N-gram and each entry in the preliminary set of symbol strings that contains the recognized N-gram.
60. A machine-readable medium having stored thereon a plurality of executable instructions, the plurality of instructions comprising instructions to: extract one or more listings N-grams from each symbol string entry in a listings database; map one or more particular listings N-grams from the one or more listings N-grams with a list of listings symbol strings that contain the particular listings N-gram; calculate an elementary second similarity score for each entry in the list of listings symbol strings that contain the particular listings N-gram; receive a list of recognized symbol strings of one or more recognized entries and a first similarity score associated with each recognized entry; extract from each recognized symbol string one or more recognized N-grams; match at least one of the recognized N-grams with at least one particular listings N-gram from the one or more particular listings N-grams mapped to the list of listings symbol strings; generate a preliminary set of symbol strings and associated second similarity scores, the preliminary set of symbol strings including one or more symbol strings from the list of listings symbol strings mapped to the at least one of the matched particular listings N-grams; compute a third similarity score associated with the one or more symbol strings included in the preliminary set of symbol strings; and output a refined set of symbol strings from the preliminary set of symbol strings based on the computed third similarity score.
61. The machine-readable medium of claim 60 having stored thereon additional executable instructions, the additional instructions comprising instructions to: calculate a listings N-gram frequency score for the one or more mapped particular listings N-grams, wherein the listings N-gram frequency score represents a number of symbol string entries in the listings database in which the particular N-gram appears.
62. The machine-readable medium of claim 61 having stored thereon additional executable instructions, the additional instructions comprising instructions to: calculate a listings N-gram frequency ratio for the one or more mapped particular listings N-grams by dividing the listings N-gram frequency score by a total number of symbol string entries in the listings database.
63. The machine-readable medium of claim 60 having stored thereon additional executable instructions, the additional instructions comprising instructions to: calculate a refined frequency score for the one or more recognized N-grams, wherein the refined frequency score represents a number of symbol string entries contained in the preliminary set of symbol strings that contain the recognized N-gram.
64. The machine-readable medium of claim 63 having stored thereon additional executable instructions, the additional instructions comprising instructions to: calculate a refined N-gram frequency ratio for the one or more recognized N-grams by dividing the refined frequency score by a total number of symbol string entries contained in the preliminary set of symbol strings.
65. The machine-readable medium of claim 64 having stored thereon additional executable instructions, the additional instructions comprising instructions to: calculate an elementary third similarity score for a recognized N-gram and each entry in the preliminary set of symbol strings that contains the recognized N-gram.
66. The machine-readable medium of claim 60 having stored thereon additional executable instructions, the additional instructions comprising instructions to: generate the refined set of symbol strings based on an established refined threshold limit of the associated third similarity scores.